NSF PAR Search | NSF Public Access Repository

Equation Attention Relationship Network (EARN) : A Geometric Deep Metric Framework for Learning Similar Math Expression Embedding

https://doi.org/10.1109/ICPR48806.2021.9412619

Ahmed, Saleem; Davila, Kenny; Setlur, Srirangaraj; Govindaraju, Venu (May 2021, 2020 25th International Conference on Pattern Recognition (ICPR))
Representation learning in the form of high-dimensional embeddings has been used for multiple pattern recognition applications. There has been significant interest in building embedding-based systems for learning representations in the mathematical domain. At the same time, retrieval of structured information such as mathematical expressions is an important need for modern IR systems. In this work, our motivation is to introduce a robust framework for learning representations for similarity-based retrieval of mathematical expressions. Given a query by example, the embedding can find the closest matching expression as a function of the Euclidean distance between them. We leverage recent advancements in image-based and graph-based deep learning algorithms to learn our similarity embeddings. We do this first by using unimodal encoders in graph space and image space, and then by using a multi-modal combination of the two. To overcome the lack of training data, we force the networks to learn a deep metric using triplets generated with a heuristic scoring function. We also adopt a custom strategy for mining hard samples to train our neural networks. Our system produces rankings similar to those generated by the original scoring function, but in only a fraction of the time. Our results establish the viability of using such a multi-modal embedding for this task.
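As a rough illustration of the ideas named in this abstract (Euclidean-distance retrieval and a triplet-based metric objective), the sketch below shows a standard margin-based triplet loss and a nearest-neighbor ranking over precomputed embeddings. It is not the authors' implementation: the function names, the margin value, and the toy vectors are all assumptions made for illustration, and the abstract does not specify the exact loss formulation used.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard margin-based triplet loss (assumed form, not from the paper).

    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def rank_by_distance(query, index):
    """Return the keys of `index` sorted by distance to the query embedding."""
    return sorted(index, key=lambda name: euclidean(query, index[name]))


# Hypothetical 2-D embeddings for two expressions.
index = {"expr_a": [0.0, 1.0], "expr_b": [3.0, 4.0]}
ranking = rank_by_distance([0.0, 0.0], index)
```

In this toy setup the query sits closer to `expr_a` than to `expr_b`, so `expr_a` is ranked first; a trained encoder would supply the embeddings instead of hand-written vectors.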
Full Text Available
Skeleton-Based Methods for Speaker Action Classification on Lecture Videos

https://doi.org/10.1007/978-3-030-68799-1_18

Xu, Fei; Davila, Kenny; Setlur, Srirangaraj; Govindaraju, Venu. (March 2021, Lecture notes in computer science)
Del Bimbo, Alberto; Cucchiara, Rita; Sclaroff, Stan; Farinella, Giovanni M; Mei, Tao; Bertini, Marco; Escalante, Hugo J; Vezzani, Roberto. (Ed.)
The volume of online lecture videos is growing at a frenetic pace. This has led to an increased focus on methods for automated lecture video analysis to make these resources more accessible. These methods consider multiple information channels including the actions of the lecture speaker. In this work, we analyze two methods that use spatio-temporal features of the speaker skeleton for action classification in lecture videos. The first method is the AM Pose model which is based on Random Forests with motion-based features. The second is a state-of-the-art action classifier based on a two-stream adaptive graph convolutional network (2S-AGCN) that uses features of both joints and bones of the speaker skeleton. Each video is divided into fixed-length temporal segments. Then, the speaker skeleton is estimated on every frame in order to build a representation for each segment for further classification. Our experiments used the AccessMath dataset and a novel extension which will be publicly released. We compared four state-of-the-art pose estimators: OpenPose, Deep High Resolution, AlphaPose and Detectron2. We found that AlphaPose is the most robust to the encoding noise found in online videos. We also observed that 2S-AGCN outperforms the AM Pose model by using the right domain adaptations.
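The step of dividing each video into fixed-length temporal segments can be sketched as follows. This is only an illustrative helper under stated assumptions: the function name is hypothetical, segments are expressed as half-open frame ranges, and keeping a shorter final segment (rather than dropping or padding it) is a choice the abstract does not specify.

```python
def fixed_length_segments(num_frames, segment_length):
    """Split frame indices [0, num_frames) into consecutive (start, end) ranges.

    Each range is half-open. The last range may be shorter than
    `segment_length` (an assumption; the source does not say how
    leftover frames are handled).
    """
    if segment_length <= 0:
        raise ValueError("segment_length must be positive")
    return [
        (start, min(start + segment_length, num_frames))
        for start in range(0, num_frames, segment_length)
    ]


# Example: a 10-frame clip cut into segments of 4 frames.
segments = fixed_length_segments(10, 4)
```

Per-frame skeleton estimates falling inside each range would then be aggregated into one representation per segment before classification.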
Full Text Available
The MathDeck Formula Editor: Interactive Formula Entry Combining LaTeX , Structure Editing, and Search

https://doi.org/10.1145/3411763.3451564

Diaz, Yancarlos; Nishizawa, Gavin; Mansouri, Behrooz; Davila, Kenny; Zanibbi, Richard (May 2021, Proc. CHI 2021)

Full Text Available
Automated Whiteboard Lecture Video Summarization by Content Region Detection and Representation

https://doi.org/10.1109/ICPR48806.2021.9412386

Kota, Bhargava Urala; Stone, Alexander; Davila, Kenny; Setlur, Srirangaraj; Govindaraju, Venu (May 2021, 2020 25th International Conference on Pattern Recognition (ICPR))
Lecture videos are rapidly becoming an invaluable source of information for students across the globe. Given the large number of online courses currently available, it is important to condense the information within these videos into a compact yet representative summary that can be used for search-based applications. We propose a framework to summarize whiteboard lecture videos by finding feature representations of detected handwritten content regions to determine unique content. We investigate multi-scale histogram of gradients and embeddings from deep metric learning for feature representation. We explicitly handle occluded, growing and disappearing handwritten content. Our method is capable of producing two kinds of lecture video summaries: the unique regions themselves (so-called key content) and keyframes (which contain all unique content in a video segment). We use weighted spatio-temporal conflict minimization to segment the lecture and produce keyframes from detected regions and features. We evaluate both types of summaries and find that we obtain state-of-the-art performance in terms of the number of summary keyframes, while our unique content recall and precision are comparable to the state of the art.
Full Text Available
FCN-LectureNet: Extractive Summarization of Whiteboard and Chalkboard Lecture Videos

https://doi.org/10.1109/ACCESS.2021.3099427

Davila, Kenny; Xu, Fei; Setlur, Srirangaraj; Govindaraju, Venu (January 2021, IEEE Access)
Full Text Available
ICPR 2020 - Competition on Harvesting Raw Tables from Infographics

https://doi.org/10.1007/978-3-030-68793-9_27

Davila, Kenny; Tensmeyer, Chris; Shekhar, Sumit; Singh, Hrituraj; Setlur, Srirangaraj; Govindaraju, Venu. (February 2021, Pattern Recognition. ICPR International Workshops and Challenges. ICPR 2021)
Del Bimbo, Alberto; Cucchiara, Rita; Sclaroff, Stan; Farinella, Giovanni M; Mei, Tao; Bertini, Marco; Escalante, Hugo J; Vezzani, Roberto. (Ed.)
This work summarizes the results of the second Competition on Harvesting Raw Tables from Infographics (ICPR 2020 CHART-Infographics). Chart Recognition is difficult and multifaceted, so for this competition we divide the process into the following tasks: Chart Image Classification (Task 1), Text Detection and Recognition (Task 2), Text Role Classification (Task 3), Axis Analysis (Task 4), Legend Analysis (Task 5), Plot Element Detection and Classification (Task 6.a), Data Extraction (Task 6.b), and End-to-End Data Extraction (Task 7). We provided two datasets for training and evaluation of the participant submissions. The first is based on synthetic charts (Adobe Synth) generated from real data sources using matplotlib. The second is based on manually annotated charts extracted from the Open Access section of PubMed Central (UB PMC). More than 25 teams registered, of which 7 submitted results for different tasks of the competition. While results on synthetic data are near perfect at times, the same models still have room to improve when it comes to data extraction from real charts. The data, annotation tools, and evaluation scripts have been publicly released for academic use.
Full Text Available
Chart Mining: A Survey of Methods for Automated Chart Analysis

https://doi.org/10.1109/TPAMI.2020.2992028

Davila, Kenny; Setlur, Srirangaraj; Doermann, David; Bhargava, Urala Kota; Govindaraju, Venu (May 2020, IEEE Transactions on Pattern Analysis and Machine Intelligence)
Charts are useful communication tools for the presentation of data in a visually appealing format that facilitates comprehension. There have been many studies dedicated to chart mining, which refers to the process of automatic detection, extraction and analysis of charts to reproduce the tabular data that was originally used to create them. By allowing access to data which might not be available in other formats, chart mining facilitates the creation of many downstream applications. This paper presents a comprehensive survey of approaches across all components of the automated chart mining pipeline such as (i) automated extraction of charts from documents; (ii) processing of multi-panel charts; (iii) automatic image classifiers to collect chart images at scale; (iv) automated extraction of data from each chart image, for popular chart types as well as selected specialized classes; (v) applications of chart mining; and (vi) datasets for training and evaluation, and the methods that were used to build them. Finally, we summarize the main trends found in the literature and provide pointers to areas for further research in chart mining.
Full Text Available
Content Extraction from Lecture Video via Speaker Action Classification Based on Pose Information

https://doi.org/10.1109/ICDAR.2019.00171

Xu, Fei; Davila, Kenny; Setlur, Srirangaraj; Govindaraju, Venu (September 2019, 2019 International Conference on Document Analysis and Recognition (ICDAR))

Online lecture videos are increasingly important e-learning materials for students. Automated content extraction from lecture videos facilitates information retrieval applications that improve access to the lecture material. A significant number of lecture videos include the speaker in the image. Speakers perform various semantically meaningful actions during the process of teaching. Among all the movements of the speaker, key actions such as writing or erasing potentially indicate important features directly related to the lecture content. In this paper, we present a methodology for lecture video content extraction using the speaker actions. Each lecture video is divided into small temporal units called action segments. Using a pose estimator, body and hands skeleton data are extracted and used to compute motion-based features describing each action segment. Then, the dominant speaker action of each of these segments is classified using Random forests and the motion-based features. With the temporal and spatial range of these actions, we implement an alternative way to draw key-frames of handwritten content from the video. In addition, for our fixed camera videos, we also use the skeleton data to compute a mask of the speaker writing locations for the subtraction of the background noise from the binarized key-frames. Our method has been tested on a publicly available lecture video dataset, and it shows reasonable recall and precision results, with a very good compression ratio which is better than previous methods based on content analysis.
Full Text Available
Visual Search Engine for Handwritten and Typeset Math in Lecture Videos and LATEX Notes

https://doi.org/10.1109/ICFHR-2018.2018.00018

Davila, Kenny; Zanibbi, Richard (August 2018, Proc. International Conference on Frontiers in Handwriting Recognition)

To fill a gap in online educational tools, we are working to support search in lecture videos using formulas from lecture notes and vice versa. We use an existing system to convert single-shot lecture videos to keyframe images that capture whiteboard contents along with the times they appear. We train classifiers for handwritten symbols using the CROHME dataset, and for LATEX symbols using generated images. Symbols detected in video keyframes and LATEX formula images are indexed using Line-of-Sight graphs. For search, we look up pairs of symbols that can 'see' each other, and connected pairs are merged to identify the largest match within each indexed image. We rank matches using symbol class probabilities and angles between symbol pairs. We demonstrate how our method effectively locates formulas between typeset and handwritten images using a set of linear algebra lectures. By combining our search engine (Tangent-V) with temporal keyframe metadata, we are able to navigate to where a query formula in LATEX is first handwritten in a lecture video. Our system is available as open-source. For other domains, only the OCR modules require updating.
Full Text Available
Generalized framework for summarization of fixed-camera lecture videos by detecting and binarizing handwritten content

https://doi.org/10.1007/s10032-019-00327-y

Urala Kota, Bhargava; Davila, Kenny; Stone, Alexander; Setlur, Srirangaraj; Govindaraju, Venu (June 2019, International Journal on Document Analysis and Recognition (IJDAR))

Full Text Available